Introduction to Distance sampling with inlabru
School of Mathematics and Statistics, University of Glasgow
November 13, 2025
What is inlabru? The short answer:
INLA is a fast method to do Bayesian inference with latent Gaussian models and
inlabru is an R package that implements this method with a flexible and simple interface.
The (much) longer answer:
Website tutorials
Discussion forums
Books
Blangiardo, M., & Cameletti, M. (2015). Spatial and spatio-temporal Bayesian models with R-INLA. John Wiley & Sons.
Gómez-Rubio, V. (2020). Bayesian inference with INLA. Chapman and Hall/CRC.
Krainski, E., Gómez-Rubio, V., Bakka, H., Lenzi, A., Castro-Camilo, D., Simpson, D., … & Rue, H. (2018). Advanced spatial modeling with stochastic partial differential equations using R and INLA. Chapman and Hall/CRC.
Wang, X., Yue, Y. R., & Faraway, J. J. (2018). Bayesian regression modeling with INLA. Chapman and Hall/CRC.
Distance sampling is a family of related methods for estimating the abundance and spatial distribution of wild populations.
Distance sampling is based on the idea that animals further away from observers are harder to detect than animals that are nearer.
This idea is implemented in the model as a detection function that depends on distance.
Coupling distance sampling data with spatial modelling allows maps of spatially varying density to be produced.
Traditionally, this is achieved in a two-stage approach: (i) use detectability point estimates to create an offset vector, then (ii) use this offset within a GLM or GAM for count response data.
This requires binning the data into counts based on some discretisation of space.
A major downside to this approach is that uncertainty from the detection model is not propagated to the second-stage spatial model.
Many of the ecological and environmental processes of interest can be represented by a spatial point process, or can be viewed as an aggregation of one.
Consider a fixed geographical region \(A\).
The set of locations at which events occur are denoted by \(\mathbf{s} = (\mathbf{s}_1, \ldots, \mathbf{s}_n)\).
A point process is defined through random variables \(N(B)\) that count the number of events in each subset \(B\) of the region of interest \(A\).
Our primary interest is in measuring where events occur, so the locations are our data.
The simplest version of a point process model is the homogeneous Poisson process (HPP).
The likelihood of a point pattern \(\mathbf{y} = \left[ \mathbf{s}_1, \ldots, \mathbf{s}_n \right]^\intercal\) distributed as a HPP with intensity \(\lambda\) and observation window \(\Omega\) is
\[ p(\mathbf{y} | \lambda) \propto \lambda^n e^{ \left( - |\Omega| \lambda \right)} , \]
where
\(|\Omega|\) is the size of the observation window,
\(\lambda\) is the expected number of points per unit area, and
\(|\Omega|\lambda\) is the total expected number of points in the observation window.
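These relationships are easy to check by simulation. A minimal base-R sketch; the window size and rate below are arbitrary illustrative choices:

```r
# Homogeneous Poisson process on a rectangle: N(Omega) ~ Poisson(|Omega| * lambda)
set.seed(1)
lambda <- 5                # expected points per unit area
area   <- 4 * 3            # |Omega| for a 4 x 3 rectangle
n      <- rpois(1, lambda * area)                        # number of events
pts    <- cbind(x = runif(n, 0, 4), y = runif(n, 0, 3))  # uniform locations given n

# Over many replicates the mean count approaches |Omega| * lambda = 60
mean(replicate(10000, rpois(1, lambda * area)))
```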
The inhomogeneous Poisson process has a spatially varying intensity \(\lambda(\mathbf{s})\).
The likelihood in this case is
\[ p(\mathbf{y} | \lambda) \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i). \]
In the case of an HPP the integral in the likelihood can easily be computed as \(\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} =|\Omega|\lambda\).
For an IPP, the integral in the likelihood has to be approximated numerically as a weighted sum:
\(\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} \approx \sum_{j=1}^J w_j \lambda(\mathbf{s}_j)\), where
\(w_j\) are the integration weights
\(\mathbf{s}_j\) are the quadrature locations.
This serves two purposes:
it approximates the integral, and
it allows us to rewrite the inhomogeneous Poisson process likelihood as a regular Poisson likelihood.
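For instance, a midpoint rule in 1D already gives the flavour of such a scheme. A sketch in base R; the intensity function here is an arbitrary illustration:

```r
# Midpoint-rule approximation of the intensity integral for a 1D IPP on [0, 10]
lambda_fun <- function(s) exp(1 + 0.5 * s)   # illustrative log-linear intensity

J  <- 50
h  <- 10 / J                  # subinterval width
sj <- h * (seq_len(J) - 0.5)  # quadrature (midpoint) locations
wj <- rep(h, J)               # integration weights

approx_int <- sum(wj * lambda_fun(sj))
exact_int  <- (exp(1 + 0.5 * 10) - exp(1)) / 0.5   # analytic integral for comparison
```

With only 50 quadrature points the relative error is already well below 0.1%.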
The idea behind the trick to rewrite the approximate likelihood is to introduce a dummy vector \(\mathbf{z}\) and an integration weights vector \(\mathbf{w}\) of length \(J + n\)
\[\mathbf{z} = \left[\underbrace{0_1, \ldots,0_J}_\text{quadrature locations}, \underbrace{1_1, \ldots ,1_n}_{\text{data points}} \right]^\intercal\]
\[\mathbf{w} = \left[ \underbrace{w_1, \ldots, w_J}_\text{quadrature locations}, \underbrace{0_1, \ldots, 0_n}_\text{data points} \right]^\intercal\]
Then the approximate likelihood can be written as
\[ \begin{aligned} p(\mathbf{z} | \lambda) &\propto \prod_{i=1}^{J + n} \lambda(\mathbf{s}_i)^{z_i} \exp\left(-w_i \lambda(\mathbf{s}_i) \right), \\ \log \lambda(\mathbf{s}_i) &= \mathbf{x}(\mathbf{s}_i)'\beta \end{aligned} \]
This is a product of Poisson-type likelihood contributions with means \(\lambda(\mathbf{s}_i)\), exposures \(w_i\) and observations \(z_i\).
This is the basis for the implementation of Cox process models in inlabru, which can be specified using family = "cp".
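Outside inlabru, the same dummy-variable trick can be verified with a weighted GLM (the Berman-Turner device). The following is a hedged 1D sketch with an arbitrary illustrative intensity; the variable names and the quadrature layout are mine, not inlabru's internals:

```r
# Berman-Turner device with plain glm(): recover a log-linear intensity in 1D
set.seed(42)
beta0 <- 5; beta1 <- 1
lam <- function(s) exp(beta0 + beta1 * s)

# Simulate an inhomogeneous Poisson process on [0, 1] by thinning
lmax <- lam(1)
n0   <- rpois(1, lmax)
s0   <- runif(n0)
s    <- s0[runif(n0) < lam(s0) / lmax]     # observed point pattern

# Pseudo-data: z = 1 at data points, z = 0 at J dummy (quadrature) points
J    <- 100; h <- 1 / J
sdum <- h * (seq_len(J) - 0.5)
cell <- function(x) pmin(J, floor(x / h) + 1)
m    <- 1 + tabulate(cell(s), nbins = J)   # points per cell (1 dummy + data)
w    <- c(h / m[cell(s)], h / m)           # quadrature weights; sum(w) = |Omega| = 1
z    <- c(rep(1, length(s)), rep(0, J))
S    <- c(s, sdum)

# Weighted Poisson regression recovers (beta0, beta1) approximately
# (quasipoisson avoids the non-integer response warning; same point estimates)
fit <- glm(z / w ~ S, family = quasipoisson(), weights = w)
coef(fit)
```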
\[ \log~\lambda(s)= \mathbf{x}(s)'\beta + \xi(s) \]
The events are then assumed to be independent given \(\xi(s)\), a Gaussian random field with Matérn covariance (represented in practice as a GMRF).
inlabru has implemented some integration schemes that are especially well suited to integrating the intensity in models with an SPDE effect.
If we have a process that is occurring everywhere in space, it is natural to try to model it using some sort of function.
\[ \mathbf{z} = (z(\mathbf{s}_1),\ldots,z(\mathbf{s}_m)) \sim \mathcal{N}(0,\Sigma) \]
where \(\Sigma_{ij} = \mathrm{Cov}(z(\mathbf{s}_i),z(\mathbf{s}_j))\) is a dense \(m \times m\) matrix.
Stationary random fields
A GRF is stationary if:
it has a constant mean (often taken to be zero);
the covariance between two points depends only on the distance and direction between those points.
It is isotropic if the covariance only depends on the distance between the points.
The goal: approximate the GRF using a triangulated mesh via the so-called SPDE approach.
The SPDE approach represents the continuous spatial process as a discretely indexed Gaussian Markov random field (GMRF).
\[ c_{\nu}(d;\sigma,\rho) = \sigma^2\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\sqrt{8\nu}\frac{d}{\rho}\right)^{\nu}K_{\nu}\left(\sqrt{8\nu}\frac{d}{\rho}\right) \]
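For concreteness, this covariance can be evaluated directly in base R (besselK is built in). A sketch; the default parameter values below are arbitrary:

```r
# Matern covariance in the (sigma, rho) parameterisation used above
matern_cov <- function(d, sigma = 1, rho = 1, nu = 1) {
  kd  <- sqrt(8 * nu) * d / rho
  out <- sigma^2 * (2^(1 - nu) / gamma(nu)) * kd^nu * besselK(kd, nu)
  out[d == 0] <- sigma^2          # limiting value as d -> 0
  out
}

matern_cov(0, sigma = 2)          # the variance sigma^2 = 4
matern_cov(1) / matern_cov(0)     # correlation at d = rho, roughly 0.14 for nu = 1
```

This shows why \(\rho\) is called the practical correlation range: correlation has dropped to about 0.1 there.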
Penalized Complexity (PC) priors proposed by Simpson et al. (2017) allow us to control the amount of spatial smoothing and avoid overfitting.
F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: The SPDE approach (with discussion). In: Journal of the Royal Statistical Society, Series B 73.4 (2011), pp. 423–498.
H. Bakka, H. Rue, G. A. Fuglstad, A. Riebler, D. Bolin, J. Illian, E. Krainski, D. Simpson, and F. Lindgren. Spatial modelling with R-INLA: A review. In: WIREs Computational Statistics 10:e1443.6 (2018). (Invited extended review). DOI: 10.1002/wics.1443.
E. T. Krainski, V. Gómez-Rubio, H. Bakka, A. Lenzi, D. Castro-Camilo, D. Simpson, F. Lindgren, and H. Rue. Advanced Spatial Modeling with Stochastic Partial Differential Equations using R and INLA. CRC Press, 2018.
We call spatial Markov models defined on a mesh SPDE models.
SPDE models have three parts:
A mesh
A range parameter \(\kappa\)
A precision parameter \(\tau\)
We use the SPDE effect to model the intensity of a point process that represents the locations of animal sightings.
Often, such sightings are made by observers who cannot detect all of the animals.
To accurately estimate abundance, we require an estimate of the number of animals that remained undetected.
The LGCP is a flexible approach that can include spatial covariates to model the mean intensity and a mean-zero spatially structured random effect to account for unexplained heterogeneity not captured by the covariates.
To account for the imperfect detection of points we specify a thinning probability function \[g(s) = \mathbb{P}(\text{a point at s is detected}|\text{a point is at s})\]
A key property of LGCPs is that a realisation of a point process with intensity \(\lambda(s)\), thinned by a probability function \(g(s)\), is also an LGCP, with intensity:
\[ \underbrace{\tilde{\lambda}(s)}_{\text{observed process}} = \underbrace{\lambda(s)}_{\text{true process}} \times \underbrace{g(s)}_{\text{thinning probability}} \]
Let's visualize this in 1D:
Intensity (density) function with points and transect locations
Detection function \(\color{red}{g(s)}\); here \(\color{red}{g(s) = 1}\) on the transects (at x = 10, 30 and 50).
The detection function describes the probability \(\color{red}{p(s)}\) that a point at \(s\) is detected.
Observations are from a thinned Poisson process with intensity \(\lambda(s) \color{red}{p(s)}\)
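A quick 1D simulation shows the thinning in action. A base-R sketch; the constant intensity and the bell-shaped detection curve are illustrative choices, not part of the model above:

```r
# Thinning a 1D Poisson process: retained points follow intensity lambda(s) * g(s)
set.seed(7)
lambda0 <- 200                                  # constant true intensity on [0, 60]
g <- function(s) exp(-0.5 * ((s - 30) / 5)^2)   # detection probability, peaked at s = 30

n    <- rpois(1, lambda0 * 60)                  # simulate the true process
s    <- runif(n, 0, 60)
kept <- s[runif(n) < g(s)]                      # thin: keep a point at s with prob g(s)

# E[number retained] = lambda0 * integral of g = 200 * 5 * sqrt(2 * pi), about 2507
length(kept)
```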
Half-normal: \(g(\mathbf{s}|\sigma) = \exp(-0.5 (d(\mathbf{s})/\sigma)^2)\)
Hazard-rate: \(g(\mathbf{s}|\sigma) = 1 - \exp\left(-(d(\mathbf{s})/\sigma)^{-1}\right)\)
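Both forms are straightforward to code and compare. A sketch mirroring the formulas above; the sigma values in the calls are arbitrary:

```r
# The two detection functions as R functions of distance d
g_hn <- function(d, sigma) exp(-0.5 * (d / sigma)^2)    # half-normal
g_hr <- function(d, sigma) 1 - exp(-(d / sigma)^(-1))   # hazard-rate (as above)

g_hn(0, 2)                   # 1: detection is certain at distance 0
c(g_hn(4, 2), g_hr(4, 2))    # both decay with distance, at different rates
```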
\[ \pi(\mathbf{s}_1,\ldots,\mathbf{s}_m) = \exp\left( |\Omega| - \int_{\Omega}\lambda(\mathbf{s})g(\mathbf{s})\,\mathrm{d}\mathbf{s} \right) \prod_{i=1}^m \lambda(\mathbf{s}_i)g(\mathbf{s}_i) \]
To make \(g(s)\) and \(\lambda(s)\) identifiable, we assume intensity is constant with respect to distance from the observer.
If the strip width (\(2W\)) is narrow compared to the study region (\(\Omega\)), we can treat the strips as lines.
Define the Poisson process likelihood on the Kronecker product space (line \(\times\) distance).
Accounting for imperfect detection, the thinned Poisson process model on (space, distance) along the transects becomes:
\[ \begin{aligned} \log \tilde{\lambda}(s,\text{distance}) &= \overbrace{\mathbf{x}'\beta + \xi(s)}^{\log \lambda(s)} + \log \mathbb{P}(\text{detection at }s|\text{distance},\sigma) + \log(2)\\ \mathbb{P}(\text{detection}) &=1-\exp\left(-\frac{\sigma}{\text{distance}}\right) \end{aligned} \]
Here \(\log 2\) accounts for the two-sided detection.
Typically \(\mathbb{P}(\text{detection}\,|\,\text{distance})\) is a non-linear function of its parameters; that is where inlabru can help, via a fixed-point iteration scheme (further details are available in this vignette):
we define \(\log(\sigma)\) as a latent Gaussian variable and iteratively linearise the predictor.
In the next example, we will explore data from a combination of several NOAA shipboard surveys conducted on pan-tropical spotted dolphins in the Gulf of Mexico.
A total of 47 observations of groups of dolphins were detected. The group size was recorded, as well as the Beaufort sea state at the time of the observation.
The transect width is 16 km, i.e., a transect half-width (maximal detection distance) of 8 km.
First, we need to create the mesh used to approximate the random field. We can either:
build a boundary with the fm_nonconvex_hull function and pass it to the fm_mesh_2d function (both from the fmesher package), or
take an existing sf boundary and specify it directly in the mesh construction via the fm_mesh_2d function.
In both cases, max.edge controls the maximum triangle edge lengths and cutoff avoids overly small triangles in clustered areas.
All random field models need to be discretised for practical calculations.
The SPDE models were developed to provide a consistent model definition across a range of discretisations.
We use finite element methods with local, piecewise linear basis functions defined on a triangulation of a region of space containing the domain of interest.
Deviation from stationarity is generated near the boundary of the region.
The choice of region and choice of triangulation affects the numerical accuracy.
Too fine a mesh \(\rightarrow\) heavy computation
Too coarse a mesh \(\rightarrow\) not accurate enough
Some guidelines
Create triangulation meshes with fm_mesh_2d():
edge length should be around a third to a tenth of the spatial range
Move undesired boundary effects away from the domain of interest by extending to a smooth external boundary:
Use a coarser resolution in the extension to reduce computational cost (max.edge = c(inner, outer)), i.e., add extra, larger triangles around the border.
Use a fine resolution (subject to available computational resources) for the domain of interest (inner edge length below the correlation range) and avoid small edges, i.e., filter out small input point clusters (0 \(<\) cutoff \(<\) inner).
Coastlines and similar can be added to the domain specification in fm_mesh_2d() through the boundary argument.
Simplify the border where possible.
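Putting these guidelines together, a mesh construction might look as follows. This is a sketch assuming the mexdolphin data from inlabru is loaded; the convex, max.edge and cutoff values are illustrative, not tuned:

```r
library(fmesher)

# Option 1: build a smoothed non-convex boundary around the observations
bnd <- fm_nonconvex_hull(mexdolphin$points, convex = 50)

# Inner triangles at most 30 units, coarser (100) in the extension;
# merge input locations closer than 10 units (0 < cutoff < inner max.edge)
mesh <- fm_mesh_2d(
  boundary = bnd,
  max.edge = c(30, 100),
  cutoff = 10
)
```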
We use the inla.spde2.pcmatern function to define the SPDE model, using PC priors through the following probability statements:
\(P(\rho < 50) = 0.1\)
\(P(\sigma > 2) = 0.1\)
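The corresponding call might look like this; a sketch, assuming the INLA package and a mesh object named mesh are available:

```r
library(INLA)

# PC priors matching the statements above:
# P(range < 50) = 0.1 and P(sigma > 2) = 0.1
spde_model <- inla.spde2.pcmatern(
  mesh,
  prior.range = c(50, 0.1),
  prior.sigma = c(2, 0.1)
)
```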
We start by plotting the distances and histogram of frequencies in distance intervals.
Then, we need to define a half-normal detection probability function. This must take distance as its first argument and the linear predictor of the sigma parameter as its second:
The LGCP Model
\[ \begin{aligned} p(\mathbf{y} | \lambda) & \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) p(\mathbf{s}) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) p(\mathbf{s}_i) \\ \eta(s) & = \beta_0 + \omega(s) + \log p(s) \end{aligned} \]
The code
# define the model components
cmp = ~ Intercept(1) +
  space(main = geometry, model = spde_model) +
  sigma(1,
    prec.linear = 1,
    marginal = bru_mapper_marginal(qexp, pexp, dexp, rate = 1 / 8)
  )
# define the model predictor
eta = geometry + distance ~ space +
  log(hn(distance, sigma)) +
  Intercept + log(2)
# build the observation model
lik = bru_obs("cp",
  formula = eta,
  data = mexdolphin$points,
  ips = ips
)
# fit the model
fit = bru(cmp, lik)
We can use fit$summary.fixed and fit$summary.hyperpar to obtain posterior summaries of the model parameters.
| Parameter | mean | 0.025quant | 0.975quant |
|---|---|---|---|
| Intercept | −8.41 | −9.47 | −7.62 |
| sigma | −0.05 | −0.46 | 0.36 |
| Range for space | 131.74 | 41.79 | 320.28 |
| Stdev for space | 1.17 | 0.72 | 1.78 |
To map the spatial intensity we first need to define a grid of points where we want to predict.
fm_pixels() creates a regular grid of points covering the mesh.
The predict function takes as input the fitted model (fit) and the prediction points (pxl).
Plot with ggplot and add a gg() layer with your output of interest (e.g., pr.int$spatial).
We can also use the predict function to predict the detection probabilities:
47 groups were seen. How many would be seen along the transects under perfect detection?
| mean | sd | q0.025 | q0.5 | q0.975 | median | sd.mc_std_err | mean.mc_std_err |
|---|---|---|---|---|---|---|---|
| 94.78 | 26.01 | 57.71 | 91.17 | 153.50 | 91.17 | 2.79 | 3.16 |
How many would be seen under perfect detection across the whole study area (i.e., the mean expected number of dolphins)?
| mean | sd | q0.025 | q0.5 | q0.975 | median | sd.mc_std_err | mean.mc_std_err |
|---|---|---|---|---|---|---|---|
| 305.59 | 70.92 | 191.91 | 303.77 | 461.18 | 303.77 | 4.96 | 8.08 |
What’s the predictive distribution of group counts?
We can also get Monte Carlo samples for the expected number of dolphins as follows:
Ns <- seq(50, 450, by = 1)
Nest <- predict(fit, predpts,
~ data.frame(
N = Ns,
density = dpois(
Ns,
lambda = sum(weight * exp(space + Intercept))
)
),
n.samples = 2000
)
Nest <- dplyr::bind_rows(
cbind(Nest, Method = "Posterior"),
data.frame(
N = Nest$N,
mean = dpois(Nest$N, lambda = Lambda$mean),
mean.mc_std_err = 0,
Method = "Plugin"
)
)

pr.int1 <- predict(fit_hazard, pxl, ~ data.frame(
  spatial = space,
  loglambda = Intercept + space,
  lambda = exp(Intercept + space)
))
distdf <- data.frame(distance = seq(0, 8, length.out = 100))
dfun1 <- predict(fit_hazard, distdf, ~ hr(distance, sigma))
ggplot() +
  gg(pr.int1$loglambda, geom = "tile") +
  scale_fill_scico(palette = "imola", name = expression(log(lambda))) +
  plot(dfun1) +
  plot_layout(ncol = 2)

Look at the goodness-of-fit of the two models in the distance dimension.